Notes

In [1]:
# This section is a bunch of notes that I'm writing for myself. You needn't read them. Skip to the introduction.

# TODO: This will be the delegated notebook!


# FRI JAN 8 TODO:
# TODO: Create symlinks in this directory for the dataset so that it can be exported to other people more easily
# TODO: Clean up the names of things in this notebook
# TODO: Move all these functions to other files and create a delegation notebook to test them
# TODO: Create the standard deviation measurement from multiple translations, and create a visualization for that deviation in this notebook (with some tests, of course)
# class ViewConsistencyVarianceLoss: def __init__(self, tex_width, tex_height, num_labels, pyramid_weights=[1,1,1])
#    def forward(self, scene_uvs, scene_translations)
# TODO: Smooth moving cube blender animation for demos
# FOR EXPERIMENT: With pure simulated data, like textured cube in blender that moves around, we could use mean squared error for measuring how good each method is!
# TODO: Figure out why the table is so blurry in the naive reconstructions. Is this because the MUNIT is randomly shifting the result image? It seems to be a discrete blur, in that a few shifts are averaged together...
# NOTE: it might be beneficial to use multiple values of recovery_resolution in the view consistency loss; because that way it can criticize both high and low detail scales. This can be done with multiple ViewConsistencyLoss objects; perhaps aggregated into a MultiScaleViewConsistencyLoss(nn.Module) class. 
# NOTE: Uses of this might be for: reinforcement learning with multiple cameras, mobile robots, reinforcement learning with some temporal memory between frames and/or using optical flow. Might also be useful for data augmentation for image segmentation and object detection tasks?
# TODO: Once we get the thing working, can we then use the vid2vid to bake textures onto objects?

#TODO: Try using the crummy recovered textures from averaging the cyclegan outputs as the initial texture instead of random noise. Also try using a network that returns its own input as an initial neural network.


#IDEA: There's also a texture that gets evolved. The texture is pushed to match the output of the translations, but the translations are only pushed to match the hue of that texture. That way it doesn't hold back the translation network **too** much but should remain non-blurry. Or better yet, the translation network is only responsible for shading and the texture does the rest...

#HOW TO INTEGRATE:
#    turn the learned-neural-rendered-image-projection thing into a DataLoader class, and substitute that in for the current dataloader for the MUNIT algorithm.
#    then, just add the consistency loss. Do


#RENAMINGS: textures, weights becomes texture_pack, weight_pack
#RENAMINGS: num_labels becomes num_textures
#TODO: Define what a texture and scene are, with pictures of UV scenes etc.

Imports and Setup

Imports

In [2]:
from rp import *
import torch
import icecream

#Install packages if needed:
pip_import('einops');
pip_import('torch' );

Config

In [3]:
%config InlineBackend.figure_format='retina'
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

icecream.ic(device);
ic| device: device(type='cuda')

Helper Functions

In [4]:
def display_images(images):
    if isinstance(images,torch.Tensor):
        images=as_numpy_images(images)
    display_image(tiled_images(images))
In [5]:
def as_numpy_image(image):
    if isinstance(image,np.ndarray):
        return image.copy()
    else:
        return as_numpy_images(image.unsqueeze(0))[0]
In [6]:
def resize_images(images,size,interp='bilinear'):
    return [cv_resize_image(image,size,interp) for image in images]

Introduction

In [7]:
print("This is a photo: the target domain is an alphabet block")
photo_image=as_rgb_image(load_image(random_element(get_all_files('datasets/alphacube/photos'))))
icecream.ic(photo_image.shape)
display_image(photo_image)

print("This is a 'Scene': A picture of a 3d model's UV map (red/green), and blue 'label' channel indicating what's what")
scene_image=as_rgb_image(load_image(random_element(get_all_files('datasets/alphacube/scenes'))))
icecream.ic(scene_image.shape)
display_image(scene_image)

print("This is an example of a 'Texture', an image that gets applied to UV maps for a particular label. ")
print("This particular texture gets applied to the alphabet cube. It's a bit blurry because it was recovered from data.")
print("Note that in my code, we use square textures. This is an arbitrary choice; they don't have to be.")
display_image(load_image('assets/texture_example.png'))

print("The goal of this tutorial is to show you how some of the functions in this project are used, giving you")
print("a more visual intuition for this project as a whole")
ic| photo_image.shape: (566, 812, 3)
This is a photo: the target domain is an alphabet block
This is a 'Scene': A picture of a 3d model's UV map (red/green), and blue 'label' channel indicating what's what
ic| scene_image.shape: (566, 812, 3)
This is an example of a 'Texture', an image that gets applied to UV maps for a particular label. 
This particular texture gets applied to the alphabet cube. It's a bit blurry because it was recovered from data.
Note that in my code, we use square textures. This is an arbitrary choice; they don't have to be.
The goal of this tutorial is to show you how some of the functions in this project are used, giving you
a more visual intuition for this project as a whole

Projection / Unprojection

Prepare example data

In [8]:
image_paths=get_all_files('datasets/alphacube/scenes',sort_by='number')
image_paths=image_paths[:16] #For the previews, limit the number of samples. It makes the .ipynb files smaller.
cube_models=load_images(image_paths,show_progress=True,use_cache=True)
cube_models=[as_float_image(cube_model) for cube_model in cube_models]
cube_models=as_numpy_array(cube_models)
print("A random cube model:")
display_images(cube_models[:4])
rp.load_images: Done! Loaded 16 images in 0.204 seconds
A random cube model:
In [9]:
stone='https://www.filterforge.com/filters/12449.jpg'
tiles='https://filterforge.com/filters/10857-v4.jpg'
wood ='https://filterforge.com/filters/8892.jpg'
paved='https://filterforge.com/filters/14157.jpg'
metal='https://www.filterforge.com/filters/1375.jpg'
gears='https://www.filterforge.com/filters/8624.jpg'
walls='https://www.filterforge.com/filters/15245.jpg'
grass='https://www.filterforge.com/filters/11635.jpg'
china='https://www.filterforge.com/filters/9935.jpg'


#Go ahead and modify this notebook here: choose your favorite two textures!
#The first one goes to the cube, and the second one goes to the table.
albedo       =china
second_albedo=wood

albedo       =load_image(albedo       ,use_cache=True)
second_albedo=load_image(second_albedo,use_cache=True)


#Display the images:
ims=load_images([stone,tiles,wood,paved,metal,gears,walls,grass,china],use_cache=True)
ims=resize_images(ims,.25)
ims=labeled_images(ims,'stone,tiles,wood,paved,metal,gears,walls,grass,china'.split(','))
ims=tiled_images(ims)
print("Texture options:")
display_image(ims)

print("Albedo Map:")
display_image(albedo)
icecream.ic(albedo.shape)

print("Second Albedo Map:")
display_image(second_albedo)
icecream.ic(second_albedo.shape)


#Create the torch tensors:
torch_cube_models=as_torch_images(cube_models).to(device)

torch_albedo       =torch.tensor(albedo       ).to(device).permute(2,0,1)/255
torch_second_albedo=torch.tensor(second_albedo).to(device).permute(2,0,1)/255
Texture options:
Albedo Map:
ic| albedo.shape: (512, 512, 3)
Second Albedo Map:
ic| second_albedo.shape: (512, 512, 3)

Important Utility Functions

In [10]:
from source.scene_reader import extract_scene_uvs_and_scene_labels
In [11]:
scene_uvs, scene_labels = extract_scene_uvs_and_scene_labels(torch_cube_models,[0,255])

icecream.ic(scene_labels.flatten().unique())

icecream.ic(torch_cube_models.shape,
            scene_uvs        .shape,
            scene_labels     .shape);
ic| scene_labels.flatten().unique(): tensor([0, 1], device='cuda:0')
ic| torch_cube_models.shape: torch.Size([16, 4, 566, 812])
    scene_uvs        .shape: torch.Size([16, 2, 566, 812])
    scene_labels     .shape: torch.Size([16, 566, 812])
In [12]:
from source.projector import colorized_scene_labels
In [13]:
print("Colorized with arbitrary colors, such as blue and pink...")
colorized_labels = colorized_scene_labels(scene_labels, torch.Tensor([[1,0,.5],[0,.25,.5]]))
display_images(colorized_labels[:4])

print("Colorized with more arbitrary colors, such as black and green...")
colorized_labels = colorized_scene_labels(scene_labels, torch.Tensor([[0,1,0],[0,0,0]]))
display_images(colorized_labels[:4])
Colorized with arbitrary colors, such as blue and pink...
Colorized with more arbitrary colors, such as black and green...

Projection

Functions

In [14]:
from source.projector import project_textures

Demo 1: Albedo and Second Albedo (Arbitrary textures)

In [15]:
textures=torch.stack((torch_albedo, torch_second_albedo))

icecream.ic(textures.shape)

scene_projections = project_textures(scene_uvs, scene_labels, textures)
print("Rendered images from torch: should look identical to the previous animation on every frame")
display_images(as_numpy_images(scene_projections[:4]))
ic| textures.shape: torch.Size([2, 3, 512, 512])
Rendered images from torch: should look identical to the previous animation on every frame
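
The projection above can be pictured as a per-pixel texture lookup. The block below is a minimal NumPy sketch under that assumption. It is not the code in source.projector: the name project_textures_sketch, the nearest-neighbour sampling, and the convention that U indexes columns and V indexes rows are all illustrative assumptions.

```python
import numpy as np

def project_textures_sketch(scene_uvs, scene_labels, textures):
    """Hypothetical nearest-neighbour stand-in for project_textures.

    scene_uvs:    (N, 2, H, W) floats in [0, 1]; channel 0 is U, channel 1 is V
    scene_labels: (N, H, W) ints; which texture each pixel samples from
    textures:     (L, 3, TH, TW) floats
    returns:      (N, 3, H, W) rendered images
    """
    _, _, tex_h, tex_w = textures.shape
    # Assumed convention: U indexes texture columns, V indexes texture rows.
    cols = np.clip(np.rint(scene_uvs[:, 0] * (tex_w - 1)).astype(int), 0, tex_w - 1)
    rows = np.clip(np.rint(scene_uvs[:, 1] * (tex_h - 1)).astype(int), 0, tex_h - 1)
    # Advanced indexing gathers one RGB texel per pixel: shape (N, H, W, 3).
    gathered = textures[scene_labels, :, rows, cols]
    # Move the colour channel back to position 1: shape (N, 3, H, W).
    return np.moveaxis(gathered, -1, 1)
```

A real implementation would more likely interpolate between neighbouring texels so that the result stays differentiable with respect to the texture; nearest-neighbour is used here only to keep the idea visible.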

Unprojection

Functions

In [16]:
from source.unprojector import unproject_translations, unproject_translations_individually

Demo 1: Albedo and Second Albedo

In [17]:
num_labels=len(textures)
recovery_resolution=1024
# recovery_resolution=512
# recovery_resolution=256
recovered_textures, recovered_weights = unproject_translations(scene_projections                ,
                                                               scene_uvs                        ,
                                                               scene_labels                     ,
                                                               num_labels                       ,
                                                               output_height=recovery_resolution,
                                                               output_width =recovery_resolution)
In [18]:
print("Unprojection mean:")
display_images(recovered_textures)
#Repeat the weights over three channels so they can be displayed as images
w=torch.stack((recovered_weights,recovered_weights,recovered_weights),dim=1)
#Normalize each weight map so that its largest value is 1
w=w/w.max(dim=1,keepdim=True)[0].max(dim=2,keepdim=True)[0].max(dim=3,keepdim=True)[0]
print("Unprojection weights:")
display_images(w)
Unprojection mean:
Unprojection weights:
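
The "mean" and "weights" shown above suggest an accumulate-then-divide scheme: every image pixel adds its colour to the texel its UV points at, and the weight records how often each texel was hit. The following NumPy sketch illustrates that idea only. The name unproject_sketch and the nearest-texel rounding are assumptions, not the behaviour of source.unprojector.

```python
import numpy as np

def unproject_sketch(images, scene_uvs, scene_labels, num_labels, height, width):
    """Hypothetical nearest-texel stand-in for unproject_translations.

    images: (N, 3, H, W), scene_uvs: (N, 2, H, W), scene_labels: (N, H, W) ints.
    Returns the per-texel mean colour (L, 3, height, width) and the
    per-texel hit count (L, height, width) that serves as its weight.
    """
    sums = np.zeros((num_labels, height, width, 3))
    weights = np.zeros((num_labels, height, width))
    # Assumed convention: U indexes texture columns, V indexes texture rows.
    cols = np.clip(np.rint(scene_uvs[:, 0] * (width - 1)).astype(int), 0, width - 1)
    rows = np.clip(np.rint(scene_uvs[:, 1] * (height - 1)).astype(int), 0, height - 1)
    colours = np.moveaxis(images, 1, -1)  # (N, H, W, 3)
    # np.add.at accumulates correctly even when several pixels hit one texel.
    np.add.at(sums, (scene_labels, rows, cols), colours)
    np.add.at(weights, (scene_labels, rows, cols), 1.0)
    # Texels that were never observed keep a mean of 0 and a weight of 0.
    means = sums / np.maximum(weights, 1e-8)[..., None]
    return np.moveaxis(means, -1, 1), weights
```

Under this reading, dark regions in the weight images are simply texels that few or no scene pixels mapped to.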
In [19]:
frames=[]

for scene_uv, scene_label, scene_projection, cube_model in zip(scene_uvs, scene_labels, scene_projections, cube_models):
    # recovery_resolution=1024
    recovery_resolution=512
    # recovery_resolution=256
    recovered_textures, _ = unproject_translations(scene_projection[None]           ,
                                                   scene_uv        [None]           ,
                                                   scene_label     [None]           ,
                                                   num_labels                       ,
                                                   output_height=recovery_resolution,
                                                   output_width =recovery_resolution)
    
    scene_projection   = as_numpy_image (scene_projection  )
    recovered_textures = as_numpy_images(recovered_textures)
    
    scene_width = get_image_width(scene_projection)
    assert get_image_width(cube_model) == scene_width
    
    scene_stuff = [scene_projection, cube_model]
    scene_stuff = resize_images (scene_stuff, recovery_resolution/scene_width          )
    scene_stuff = labeled_images(scene_stuff, ['Scene Projection', 'UV Map and Labels'])
    
    recovered_textures = labeled_images(recovered_textures, ['Recovered Albedo','Recovered Second Albedo'])
    
    frame = grid_concatenated_images([recovered_textures, scene_stuff])
    
    frames.append(frame)

display_image_slideshow(frames)

Demo 2: Unprojecting Naive Image Translations

In [20]:
naive_data_nonrandom       =load_image('./assets/naive_translation_samples_nonrandom.png'       );num_samples=16
naive_data_nonrandom_nerfed=load_image('./assets/naive_translation_samples_nonrandom_nerfed.png');num_samples=32

naive_data=naive_data_nonrandom_nerfed #This one gave the most crisp results

naive_data=as_rgb_image(as_float_image(naive_data))
# Note: This naive data is loaded from a png with 1 byte per color channel,
# so its UV values are rounded into 256 positions 
# Note: There are only 32 samples (16 in the non-nerfed file). That's ok - this isn't a dataset. It's the results of an image-to-image
# translation algorithm, that's naive to the semantics of what the U,V values mean. In other words, that
# simple image-to-image translation algorithm is naive to the 3d information about the cube and table.

print('Naive UV/Translations:')
display_image(naive_data)

naive_scene_uv_and_labels, naive_scene_tranlations = split_tensor_into_regions(naive_data, 2 , num_samples, flat=False)

icecream.ic(naive_data.min(), naive_data.max(), naive_scene_uv_and_labels.shape, naive_scene_tranlations.shape);

print("Four random image translation results, zoomed in")
display_images(random_batch(naive_scene_tranlations,4))
Naive UV/Translations:
ic| naive_data.min(): 0.0
    naive_data.max(): 1.0
    naive_scene_uv_and_labels.shape: (32, 256, 256, 3)
    naive_scene_tranlations.shape: (32, 256, 256, 3)
Four random image translation results, zoomed in
In [21]:
junk_label=77 #Some arbitrary unused label value: this is to get rid of the streaks

torch_naive_scene_uv_and_labels = as_torch_images(naive_scene_uv_and_labels).to(device)
torch_naive_scene_tranlations   = as_torch_images(naive_scene_tranlations  ).to(device)
torch_naive_scene_uvs, torch_naive_scene_labels = extract_scene_uvs_and_scene_labels(torch_naive_scene_uv_and_labels,
                                                                                     label_values=[junk_label,0,255])

icecream.ic(torch_naive_scene_tranlations.shape,torch_naive_scene_tranlations.min(),torch_naive_scene_tranlations.max(),
            torch_naive_scene_uvs        .shape,torch_naive_scene_uvs        .min(),torch_naive_scene_uvs        .max(),
            torch_naive_scene_labels     .shape,torch_naive_scene_labels     .min(),torch_naive_scene_labels     .max());
ic| torch_naive_scene_tranlations.shape: torch.Size([32, 3, 256, 256])
    torch_naive_scene_tranlations.min(): tensor(0., device='cuda:0')
    torch_naive_scene_tranlations.max(): tensor(0.9961, device='cuda:0')
    torch_naive_scene_uvs        .shape: torch.Size([32, 2, 256, 256])
    torch_naive_scene_uvs        .min(): tensor(0., device='cuda:0')
    torch_naive_scene_uvs        .max(): tensor(0.9882, device='cuda:0')
    torch_naive_scene_labels     .shape: torch.Size([32, 256, 256])
    torch_naive_scene_labels     .min(): tensor(0, device='cuda:0')
    torch_naive_scene_labels     .max(): tensor(2, device='cuda:0')
In [22]:
recovery_resolution=1024  #Try out different recovery resolutions! You'll see why it's sometimes best to leave it small.
recovery_resolution=512
recovery_resolution=256
# recovery_resolution=128
number_of_naive_samples=1 #As this number increases, it will become blurrier but get more coverage
number_of_naive_samples=4 
number_of_naive_samples=32 
recovered_textures, recovered_weights = unproject_translations(torch_naive_scene_tranlations[:number_of_naive_samples],
                                                               torch_naive_scene_uvs        [:number_of_naive_samples],
                                                               torch_naive_scene_labels     [:number_of_naive_samples],
                                                               num_labels   =3                                        ,
                                                               output_height=recovery_resolution                      ,
                                                               output_width =recovery_resolution                      )

display_images(recovered_textures)

Speculation Note: "Blurryness"

A question: why are the textures so blurry? Look a bit closer at the floor texture. There's no reason it should be blurry: the image translation algorithm gets the table almost perfectly right, because it's a non-moving object that's in exactly the same place in every image. Take a look at naive_data, and you'll see the table is a lot more crisp there. The blur also seems to run only up and down: a vertical blur. My guess is that the blur comes from the averaging operation in the unprojection function, which aggregates the textures extracted from all scenes. If so, some of the tables must appear randomly shifted by a few pixels (apparently vertically) in the image translations relative to the UV inputs.

I suspect this has something to do with the data augmentation used during the naive image translation training. I'll have to look into this more later.
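
To make the averaging argument concrete, here is a toy NumPy illustration (not project code): a crisp one-row stripe is shifted by a row up and down, as a stand-in for the suspected random shifts, and the shifted copies are averaged the way the unprojection mean would average them.

```python
import numpy as np

# A crisp texture: one bright horizontal stripe on a dark background.
crisp = np.zeros((9, 9))
crisp[4, :] = 1.0

# Pretend three translations shifted the stripe by -1, 0 and +1 rows.
shifted = [np.roll(crisp, shift, axis=0) for shift in (-1, 0, 1)]

# Averaging the views smears the stripe across three rows at a third of its brightness.
averaged = np.mean(shifted, axis=0)
print(averaged[:, 0])
```

Each individual view is still perfectly sharp; only the average is blurred, and only along the axis of the shift.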

Note: "Four Pixels"

Observe that the dots scattered around aren't 1 pixel wide; they're four pixels wide. This is because the unprojection function uses calculate_subpixel_weights(...). Spreading each sample out this way makes visible areas of textures more likely to overlap between different views, which should make for a better view consistency loss down the line.
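
One common way to get four-pixel footprints is bilinear splatting, where a sample at a fractional texel position is shared between its 2x2 neighbours. The sketch below assumes that is roughly what calculate_subpixel_weights(...) does; that assumption is unverified, and bilinear_weights_sketch is a hypothetical name, not a function from this project.

```python
import math

def bilinear_weights_sketch(x, y):
    """Split one sample at fractional texel position (x, y) over its 2x2 neighbours.

    Returns a list of ((row, col), weight) pairs whose weights sum to 1.
    """
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0  # fractional offsets inside the texel
    return [
        ((y0,     x0    ), (1 - fx) * (1 - fy)),
        ((y0,     x0 + 1), fx       * (1 - fy)),
        ((y0 + 1, x0    ), (1 - fx) * fy      ),
        ((y0 + 1, x0 + 1), fx       * fy      ),
    ]

for texel, weight in bilinear_weights_sketch(2.25, 5.5):
    print(texel, weight)
```

The closer the sample lies to a texel centre, the more of its weight that texel receives, so two views that land on slightly different subpixel positions still contribute to the same texels.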

In [23]:
naive_reprojections=project_textures(torch_naive_scene_uvs, torch_naive_scene_labels, recovered_textures)

naive_reprojections=as_numpy_images(naive_reprojections)

print("Naive reprojections:")
display_images(naive_reprojections)
Naive reprojections:
In [24]:
naive_to_big_reprojections=project_textures(scene_uvs, scene_labels+1, recovered_textures)
naive_to_big_reprojections=as_numpy_images(naive_to_big_reprojections)

print("Naive Reprojections:")
display_images(naive_to_big_reprojections[:4])
display_image_slideshow(resize_images(naive_to_big_reprojections,1/1))
Naive Reprojections:

Demo 3: Individual Unprojections

In [25]:
recovery_resolution = 256
recovered_texture_packs, recovered_weight_packs = unproject_translations_individually(torch_naive_scene_tranlations    ,
                                                                                      torch_naive_scene_uvs            ,
                                                                                      torch_naive_scene_labels         ,
                                                                                      num_labels   =3                  ,
                                                                                      output_height=recovery_resolution,
                                                                                      output_width =recovery_resolution)
In [26]:
print("Individually recovered junk textures:")
display_images(recovered_texture_packs[:,0])

print("Individually recovered cube textures:")
display_images(recovered_texture_packs[:,1])

print("Individually recovered table textures:")
display_images(recovered_texture_packs[:,2])

print("Individually recovered cube textures with maximum filter (purely for visual purposes):")
display_image(min_filter(max_filter(tiled_images(as_numpy_images(recovered_texture_packs[:,1])),diameter=3),diameter=3))
Individually recovered junk textures:
Individually recovered cube textures:
Individually recovered table textures:
Individually recovered cube textures with maximum filter (purely for visual purposes):
In [27]:
print("Some individually recovered textures, along with their source images")

num_display_samples=9

display_image(grid_concatenated_images([as_numpy_images(recovered_texture_packs[:num_display_samples,0]),
                                        as_numpy_images(recovered_texture_packs[:num_display_samples,1]),
                                        as_numpy_images(recovered_texture_packs[:num_display_samples,2]),
                                        naive_scene_uv_and_labels              [:num_display_samples  ] ,
                                        naive_scene_tranlations                [:num_display_samples  ] ]))
Some individually recovered textures, along with their source images

Speculation Note: "Streaks"

The strange streaks appearing in the 'junk' textures from the top left corner are a result of the smooth edges around the black circles in the UV maps, and the other distortions are from the cube/table boundaries in the UV parts of the naive_data images. Because the edges aren't crisp, there's a blend between the UVs of the table and the cube, resulting in UV values that don't actually exist and don't correspond to any object. Because they're just on the edges of objects, I suspect they'll have minimal impact on the total image texture.

In addition, you might ask: "Ok, that makes sense, but the blue values should also be interpolated, resulting in a label value that also doesn't exist. Shouldn't that mean it skips those points?" The answer is that it doesn't skip them but defaults to label #0: the junk texture. Right now, 0 is the default label whenever we don't know what label to give, which is why all the artifacts are in the top texture image (texture number zero). That might be fixed in the future, but for now I believe this won't happen in the actual use-case for this unprojector: as a data preprocessor for the image-to-image translation algorithm.

This all can (and will) be fixed by using better UV maps: crisp ones with no antialiasing. However, for the purposes of this tutorial, it doesn't really matter.
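
The fallback to label 0 can be illustrated with a small NumPy sketch. It assumes labels are found by exact matching of the 8-bit blue value against label_values, which may not be how extract_scene_uvs_and_scene_labels actually works; labels_from_blue_sketch is a hypothetical helper written for this note.

```python
import numpy as np

def labels_from_blue_sketch(blue, label_values, default_label=0):
    """Map 8-bit blue values to label indices; unmatched values get default_label."""
    labels = np.full(blue.shape, default_label, dtype=int)
    for index, value in enumerate(label_values):
        labels[blue == value] = index
    return labels

# One row of pixels crossing an antialiased cube/table edge:
# blue fades from 0 (cube) to 255 (table) through values that match neither.
blue_row = np.array([0, 0, 64, 191, 255, 255])
print(labels_from_blue_sketch(blue_row, label_values=[77, 0, 255]))
```

With label_values=[77, 0, 255] as in Demo 2, the two blended pixels in the middle match nothing and land in label 0, the junk texture, while the crisp pixels on either side get labels 1 and 2.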

Note: "Non-Blurryness"

This note is to help with a previous speculation note, "Blurryness". Note how since now we're only recovering textures from an individual scene, the table is no longer blurry. The table only becomes blurry when we're recovering a single texture from multiple scenes at once. Also notice how the cube's texture isn't as blurry.

View Consistency Loss

In [28]:
### Weighted Mean/Variance
In [29]:
from source.view_consistency import weighted_variance
In [30]:
print("Variance of the individually recovered junk textures:")
display_image(full_range(as_numpy_image(weighted_variance(recovered_texture_packs[:,0],recovered_weight_packs[:,0]))))

print("Variance of the individually recovered cube textures:")
display_image(full_range(as_numpy_image(weighted_variance(recovered_texture_packs[:,1],recovered_weight_packs[:,1]))))

print("Variance of the individually recovered table textures:")
display_image(full_range(as_numpy_image(weighted_variance(recovered_texture_packs[:,2],recovered_weight_packs[:,2]))))
Variance of the individually recovered junk textures:
Variance of the individually recovered cube textures:
Variance of the individually recovered table textures:

Here, we're displaying the variance of the first three sets of images displayed in Unprojection Demo 3. Note how there's a lot of disagreement about where the striped tape should be (which is why it's so blurry). Because of this, those areas have a high variance.

This variance will be used as a "View Consistency Loss": the neural network and neural texture will be trained to minimize this view inconsistency (measured by the variance in recovered textures). In other words, we want to make the above and below pictures dimmer.

Also, note that I'm using the variance instead of the standard deviation. That might change in the future; it really depends on what kind of results I get. I'm not sure what the best loss is, but the variance is essentially the (weighted) mean squared error of each view against the mean texture, so it may make a good loss function. I'll probably end up trying both losses though.
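
For reference, a per-texel weighted variance can be written in a few lines. This is a NumPy sketch of the standard formula, with shapes guessed from how weighted_variance is called in this notebook; it is not the implementation in source.view_consistency, and the eps guard for unobserved texels is an added assumption.

```python
import numpy as np

def weighted_variance_sketch(texture_pack, weight_pack, eps=1e-8):
    """Per-texel weighted variance across views.

    texture_pack: (V, 3, H, W) one recovered texture per view
    weight_pack:  (V, H, W)    how strongly each view observed each texel
    returns:      (3, H, W)
    """
    w = weight_pack[:, None]                 # (V, 1, H, W), broadcasts over RGB
    total = np.maximum(w.sum(axis=0), eps)   # avoid dividing by zero on unseen texels
    mean = (w * texture_pack).sum(axis=0) / total
    # Weighted mean of the squared deviation from the weighted mean.
    return (w * (texture_pack - mean) ** 2).sum(axis=0) / total
```

The standard deviation is the elementwise square root of this quantity. Both are zero exactly where all views that saw a texel agree on its colour.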

For comparison, I'll show what the standard deviation looks like below:

In [31]:
print("Standard Deviation of the individually recovered junk textures:")
display_image(full_range(as_numpy_image(weighted_variance(recovered_texture_packs[:,0],recovered_weight_packs[:,0])**.5)))

print("Standard Deviation of the individually recovered cube textures:")
display_image(full_range(as_numpy_image(weighted_variance(recovered_texture_packs[:,1],recovered_weight_packs[:,1])**.5)))

print("Standard Deviation of the individually recovered table textures:")
display_image(full_range(as_numpy_image(weighted_variance(recovered_texture_packs[:,2],recovered_weight_packs[:,2])**.5)))
Standard Deviation of the individually recovered junk textures:
Standard Deviation of the individually recovered cube textures:
Standard Deviation of the individually recovered table textures: